-
Long-horizon tasks in unstructured environments are notoriously challenging for robots because they require predicting extensive action plans with thousands of steps while adapting to ever-changing conditions by reasoning across multimodal sensing spaces. Humans efficiently tackle such compound problems by breaking them down into easily reachable abstract sub-goals, significantly reducing complexity. Inspired by this ability, we explore how robots can acquire sub-goal formulation skills for long-horizon tasks and generalize them to novel situations and environments. To address these challenges, we propose the Zero-shot Abstract Sub-goal Framework (ZAS-F), which empowers robots to decompose overarching action plans into transferable abstract sub-goals, thereby providing zero-shot capability under new task conditions. ZAS-F is an imitation-learning-based method that efficiently learns a task policy from a few demonstrations. The learned policy extracts abstract features from multimodal, extended temporal observations and uses these features to predict task-agnostic sub-goals by reasoning about their latent relations. We evaluated ZAS-F on radio frequency identification (RFID) inventory tasks across various dynamic environments, a typical long-horizon task requiring robots to handle unpredictable conditions, including unseen objects and structural layouts. Our experiments demonstrated that ZAS-F achieves 30 times higher learning efficiency than previous methods, requiring only 8k demonstrations. Compared with prior approaches, ZAS-F achieves 98.3% scanning accuracy while significantly reducing training data requirements. Further, ZAS-F demonstrated strong generalization, maintaining a 99.4% scan success rate in real-world deployment without additional fine-tuning. In long-term operations spanning 100 rooms, ZAS-F maintained performance consistent with short-term tasks, highlighting its robustness against compounding errors. These results establish ZAS-F as an efficient and adaptable solution for long-horizon robotic tasks in unstructured environments.
Free, publicly accessible full text available April 28, 2026.
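For illustration only, here is a minimal PyTorch sketch of the pipeline the abstract describes: per-modality encoders, a temporal module over an observation history, and a head that predicts an abstract sub-goal embedding. The abstract does not publish ZAS-F's architecture, so every module, dimension, and the choice of RGB plus RFID inputs below is an assumption.

```python
# Hypothetical sketch of a ZAS-F-style sub-goal predictor; not the paper's
# actual architecture. Assumes two modalities (RGB frames and RFID signal
# features) and a GRU for temporal abstraction.
import torch
import torch.nn as nn

class SubGoalPolicy(nn.Module):
    def __init__(self, rfid_dim=64, feat_dim=128, subgoal_dim=16):
        super().__init__()
        # Per-modality encoders map raw observations into a shared feature space.
        self.rgb_enc = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.rfid_enc = nn.Linear(rfid_dim, feat_dim)
        # Temporal module compresses an extended observation history.
        self.temporal = nn.GRU(2 * feat_dim, feat_dim, batch_first=True)
        # Head predicts an abstract, task-agnostic sub-goal embedding.
        self.subgoal_head = nn.Linear(feat_dim, subgoal_dim)

    def forward(self, rgb_seq, rfid_seq):
        # rgb_seq: (B, T, 3, H, W); rfid_seq: (B, T, rfid_dim)
        B, T = rgb_seq.shape[:2]
        rgb_feat = self.rgb_enc(rgb_seq.flatten(0, 1)).view(B, T, -1)
        rfid_feat = self.rfid_enc(rfid_seq)
        fused = torch.cat([rgb_feat, rfid_feat], dim=-1)
        _, h = self.temporal(fused)          # h: (1, B, feat_dim)
        return self.subgoal_head(h[-1])      # (B, subgoal_dim)

policy = SubGoalPolicy()
subgoal = policy(torch.randn(2, 8, 3, 64, 64), torch.randn(2, 8, 64))
print(subgoal.shape)  # torch.Size([2, 16])
```

In an imitation-learning setup like the one described, such a head would be trained to match sub-goal labels derived from demonstrations, with a separate low-level controller consuming the predicted sub-goal.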
-
This project introduces a framework that enables robots to recognize human hand signals, a reliable, device-free means of communication in noisy environments such as construction sites and airport ramps, to facilitate efficient human-robot collaboration. Various hand signal systems are used by small groups for specific purposes, such as marshalling on airport ramps and crane operation on construction sites. Robots must be robust to unpredictable conditions, including varied backgrounds and human appearances, an extreme challenge imposed by open environments. To address these challenges, we propose Instant Hand Signal Recognition (IHSR), a learning-based framework with world knowledge of human gestures embedded, which lets robots learn novel hand signals from only a few samples. It also offers robust zero-shot generalization, recognizing learned signals in novel scenarios. Extensive experiments show that IHSR can learn a novel hand signal from only 50 samples, more than 30 times more efficient than the state-of-the-art method. It also demonstrates robust zero-shot generalization when a learned model is deployed in unseen environments to recognize hand signals from unseen users.
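A hedged sketch of the few-shot recipe the abstract implies: a frozen pretrained encoder stands in for the embedded "world knowledge" of gestures, and only a small head is trained on the ~50 samples of a new signal. The encoder, dimensions, and training step here are hypothetical stand-ins, not IHSR's actual design.

```python
# Hypothetical few-shot signal classifier in the spirit of IHSR: freeze a
# pretrained gesture encoder and train only a lightweight head.
import torch
import torch.nn as nn

class FewShotSignalClassifier(nn.Module):
    def __init__(self, encoder, feat_dim, n_signals):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # keep "world knowledge" frozen
            p.requires_grad = False
        self.head = nn.Linear(feat_dim, n_signals)  # the only trained part

    def forward(self, x):
        with torch.no_grad():
            feat = self.encoder(x)
        return self.head(feat)

# Stand-in encoder; in practice this would be a large pretrained model.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
model = FewShotSignalClassifier(encoder, feat_dim=256, n_signals=5)
opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)

# One training step on a 50-sample batch for a new hand signal set.
x, y = torch.randn(50, 3, 64, 64), torch.randint(0, 5, (50,))
opt.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

Freezing the backbone is one common way to get sample efficiency of this order, since only the small head's parameters must be fit to the new classes.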
-
Rapid progress in Deep Learning (DL) has made it a promising technique for various autonomous robotic systems. Recently, researchers have explored deploying DL models, such as Reinforcement Learning and Imitation Learning, to enable robots to perform Radio-Frequency Identification (RFID) inventory tasks. However, existing methods are either confined to a single field or require tremendous data and time to train. To address these problems, this paper presents a Cross-Modal Reasoning Model (CMRM), designed to extract high-dimensional information from multiple sensors and reason over spatial and historical features to uncover latent cross-modal relations. Furthermore, CMRM aligns the learned task policy with high-level features to offer zero-shot generalization to unseen environments. We conduct extensive experiments in several virtual environments as well as in indoor settings with robots performing RFID inventory. The experimental results demonstrate that CMRM improves learning efficiency by roughly 20 times. It also demonstrates robust zero-shot generalization, successfully performing RFID inventory tasks when a learned policy is deployed in unseen environments.
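For intuition, here is a minimal sketch of one common way cross-modal reasoning is realized: cross-attention in which RFID features query spatial features to surface latent relations. The abstract does not specify CMRM's internals, so the module, dimensions, and action head below are assumptions.

```python
# Hypothetical cross-modal reasoning step in the spirit of CMRM; not the
# paper's actual design. RFID tokens attend over spatial tokens.
import torch
import torch.nn as nn

class CrossModalReasoner(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.policy_head = nn.Linear(dim, 4)  # e.g., a planar velocity command

    def forward(self, rfid_tokens, spatial_tokens):
        # rfid_tokens: (B, Nr, dim); spatial_tokens: (B, Ns, dim)
        fused, _ = self.attn(rfid_tokens, spatial_tokens, spatial_tokens)
        fused = self.norm(fused + rfid_tokens)       # residual fusion
        return self.policy_head(fused.mean(dim=1))   # pooled action output

reasoner = CrossModalReasoner()
action = reasoner(torch.randn(2, 6, 128), torch.randn(2, 32, 128))
print(action.shape)  # torch.Size([2, 4])
```

Keeping the policy head on top of modality-agnostic fused features, rather than raw sensor readings, is one plausible mechanism for the zero-shot transfer to unseen environments that the abstract reports.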